[V1] Add KV cache group dimension to block table #12086

heheda12345 · 2025-01-15T15:48:23Z

This PR adds a KV cache group dimension (KVCacheConfig.groups, introduced by #11960) to the block table, in preparation for allocating different blocks to different layers.

It is split out from #11938 and is a preparation for #11382.
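
As a rough illustration of what a group dimension on the block table means, here is a minimal sketch. It is not the code in this PR: the class name GroupedBlockTable and its methods are hypothetical, and the only assumption taken from the description is that each KV cache group keeps its own list of block IDs.

```python
from dataclasses import dataclass, field


@dataclass
class GroupedBlockTable:
    """Hypothetical per-request block table with a leading group dimension."""

    num_groups: int
    # block_ids[g] holds the block IDs assigned to KV cache group g.
    block_ids: list[list[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with one empty block list per group.
        if not self.block_ids:
            self.block_ids = [[] for _ in range(self.num_groups)]

    def append(self, group_id: int, new_blocks: list[int]) -> None:
        # Each group grows independently of the others.
        self.block_ids[group_id].extend(new_blocks)


table = GroupedBlockTable(num_groups=2)
table.append(0, [3, 7])
table.append(1, [12])
```

With a single group this reduces to the usual one-list-per-request layout, which is why the extra dimension is expected to cost little.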

Benchmark results (∆ is the change relative to main branch df450aa; each value is the minimum time of 3 runs on an H100).
Commands (bracketed options list the per-model alternatives):

VLLM_USE_V1=1 python3 -m vllm.entrypoints.openai.api_server --port 8888 --disable-log-requests [--model facebook/opt-125m, --model meta-llama/Llama-3.2-1B-Instruct, --model meta-llama/Llama-3.1-8B-Instruct]
python $VLLM_DIR/benchmarks/benchmark_serving.py --port 8888 --ignore-eos --seed 555 --dataset-name sharegpt --dataset-path /data/zhang-chen/dataset/ShareGPT_V3_unfiltered_cleaned_split.json [--request-rate 70 --model facebook/opt-125m, --request-rate 70 --model meta-llama/Llama-3.2-1B-Instruct, --request-rate 30 --model meta-llama/Llama-3.1-8B-Instruct] --percentile_metrics='ttft,tpot,itl,e2el'

Model             E2EL(ms)   ∆E2EL(%)    ITL(ms)    ∆ITL(%)   TPOT(ms)   ∆TPOT(%)   TTFT(ms)   ∆TTFT(%)
opt-125m            437.16     +1.10%       1.87     +1.08%       1.90     +1.60%      14.92     -0.13%
llama-1b            796.35     +0.61%       3.91     +0.51%       4.00     -1.23%      24.43     -0.41%
llama-8b           2652.95     +0.00%      13.23     +0.00%      13.36     +0.00%      40.08     +1.34%

The results show that adding the group dimension introduces little overhead: every metric changes by at most about 1.6% relative to main.
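
For reference, the ∆ columns can be read as a relative change between best-of-3 timings. The sketch below assumes that interpretation; the helper names and the sample timings are made up for illustration and are not measurements from this PR.

```python
def best_of(runs_ms: list[float]) -> float:
    """Return the minimum time across repeated runs."""
    return min(runs_ms)


def relative_change_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage change of candidate relative to baseline."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Illustrative timings only (not taken from the table above).
main_runs = [10.2, 10.0, 10.1]
pr_runs = [10.4, 10.1, 10.3]

delta = relative_change_pct(best_of(main_runs), best_of(pr_runs))
print(f"{delta:+.2f}%")  # prints "+1.00%"
```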

Signed-off-by: Chen Zhang <[email protected]>

github-actions · 2025-01-15T15:48:37Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

heheda12345 · 2025-01-20T07:07:23Z

vllm/v1/worker/gpu_model_runner.py

@@ -70,9 +70,7 @@ def __init__(

        self.is_multimodal_model = model_config.is_multimodal_model
        self.sliding_window = model_config.get_sliding_window()
-        self.block_size = cache_config.block_size


block_size is now taken from the KVCacheSpec of each group instead.
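
A minimal sketch of what reading the block size per group could look like. GroupSpec is a hypothetical stand-in for KVCacheSpec, and blocks_needed is an illustrative helper, not a function from this PR; the only assumption is that each group's spec carries its own block_size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSpec:
    """Hypothetical stand-in for a per-group KVCacheSpec."""

    block_size: int  # number of tokens stored in one block


def blocks_needed(num_tokens: int, specs: list[GroupSpec]) -> list[int]:
    """Blocks required per group, using each group's own block size."""
    # -(-a // b) is ceiling division for positive integers.
    return [-(-num_tokens // spec.block_size) for spec in specs]


specs = [GroupSpec(block_size=16), GroupSpec(block_size=32)]
needed = blocks_needed(100, specs)
```

Here a 100-token request needs 7 blocks in the first group and 4 in the second, which a single runner-wide block_size could not express.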

mergify · 2025-01-22T05:38:17Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

heheda12345 added 5 commits January 15, 2025 01:10

can run

0f8a54c

Signed-off-by: Chen Zhang <[email protected]>

fix tests

990d086

Signed-off-by: Chen Zhang <[email protected]>

format

e46fff5

Signed-off-by: Chen Zhang <[email protected]>

fix bug

36a649a

Signed-off-by: Chen Zhang <[email protected]>

add comments

9c36e7d

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 requested review from WoosukKwon, njhill, ywang96 and comaniac as code owners January 15, 2025 15:48

format

da6b549

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 marked this pull request as draft January 15, 2025 15:53

heheda12345 added 5 commits January 17, 2025 07:23

Merge branch 'main' of github.com:vllm-project/vllm into grouped_bloc…

4030199

…k_table

Merge branch 'main' of github.com:vllm-project/vllm into grouped_bloc…

2d8213e

…k_table

update code

41bc571

Signed-off-by: Chen Zhang <[email protected]>

Merge branch 'main' of github.com:vllm-project/vllm into grouped_bloc…

a939b6d

…k_table

can run

34c9d74

Signed-off-by: Chen Zhang <[email protected]>

update comments

cfcf2b4

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 marked this pull request as ready for review January 20, 2025 08:18

heheda12345 changed the title ~~[V1][WIP] Add KV cache group dimension to block table~~ [V1] Add KV cache group dimension to block table Jan 22, 2025

mergify bot added the needs-rebase label Jan 22, 2025

Merge branch 'main' of github.com:vllm-project/vllm into grouped_bloc…

99de9f8

…k_table Signed-off-by: Chen Zhang <[email protected]>

heheda12345 requested review from robertgshaw2-redhat and alexm-redhat as code owners January 22, 2025 05:42

mergify bot removed the needs-rebase label Jan 22, 2025
